125 research outputs found

Trajectory-Based Off-Policy Deep Reinforcement Learning

Author: Daniel Christian
Doerr Andreas
Toussaint Marc
Trimpe Sebastian
Volpp Michael
Publication date: 14/05/2019

Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently get stuck in local optima. This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. The resulting objective is amenable to standard neural network optimization strategies like stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo. Incorporation of previous rollouts via importance sampling greatly improves data-efficiency, whilst stochastic optimization schemes facilitate the escape from local optima. We evaluate the proposed approach on a series of continuous control benchmark tasks. The results show that the proposed algorithm is able to successfully and reliably learn solutions using fewer system interactions than standard policy gradient methods. Comment: Includes appendix. Accepted for ICML 201

arXiv.org e-Print Archive

MPG.PuRe

Model-Based Policy Search for Automatic Tuning of Multivariate PID Controllers

Author: Doerr Andreas
Marco Alonso
Nguyen-Tuong Duy
Schaal Stefan
Trimpe Sebastian
Publication date: 08/03/2017

PID control architectures are widely used in industrial applications. Despite their low number of open parameters, tuning multiple, coupled PID controllers can become tedious in practice. In this paper, we extend PILCO, a model-based policy search framework, to automatically tune multivariate PID controllers purely based on data observed on an otherwise unknown system. The system's state is extended appropriately to frame the PID policy as a static state feedback policy. This renders PID tuning possible as the solution of a finite horizon optimal control problem without further a priori knowledge. The framework is applied to the task of balancing an inverted pendulum on a seven degree-of-freedom robotic arm, thereby demonstrating its capabilities of fast and data-efficient policy learning, even on complex real-world problems. Comment: Accepted final version to appear in 2017 IEEE International Conference on Robotics and Automation (ICRA)

arXiv.org e-Print Archive

Crossref

Policy search for imitation learning

Author: Doerr Andreas
Publication date: 01/01/2015

Efficient motion planning and possibilities for non-experts to teach new motion primitives are key components for a new generation of robotic systems. In order to be applicable beyond the well-defined context of laboratories and the fixed settings of industrial factories, those machines have to be easily programmable, adapt to dynamic environments, and learn and acquire new skills autonomously. Reinforcement learning in principle solves those learning issues but suffers from the curse of dimensionality. When dealing with complex environments and highly agile hardware platforms like humanoid robots in large or possibly continuous state and action spaces, the reinforcement framework becomes computationally infeasible. In recent publications, parametrized policies have been employed to face this problem. One of them, Policy Improvement with Path Integrals (PI^2), has been derived from the transformation of the Hamilton-Jacobi-Bellman (HJB) equation of stochastic optimal control into a path integral using the Feynman-Kac theorem. Applications of PI^2 are so far limited to Dynamic Movement Primitives (DMP) to parametrize the motion policy. Another policy parametrization, the formulation of motion primitives as the solution of an optimization-based planner, has been widely used in other fields (e.g. inverse optimal control) and offers compelling possibilities to formulate characteristic parts of a motion in an abstract sense without specifying too much problem-specific geometry. Imitation learning or learning from demonstration can be seen as a way to bootstrap the acquisition of new behavior and as an efficient way to guide the policy search into a desired direction. Nevertheless, due to imperfect demonstrations, which might be incomplete or contradictory, and due to noise, the learned behavior might be insufficient.
As observed in the animal kingdom, a final trial-and-error phase guided by the cost and reward of a specific behavior is necessary to obtain a successful behavior. Interestingly, the reinforcement learning framework might offer the tools to govern both learning methods at the same time. Imitation learning can be reformulated as reinforcement learning under a specific reward function, allowing the combination of both learning methods. In this work, the concept of probability-weighted averaging of policy roll-outs as seen in PI^2 is combined with an optimization-based policy representation. The reinforcement learning toolbox and direct policy search are utilized in a way that allows both imitation learning based on arbitrary demonstration types and the imposition of additional objectives on the learned behavior. A black-box evolutionary algorithm, Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which can be shown to be closely related to the approach in PI^2, is leveraged to explore the parameter space. This work will experimentally evaluate the suitability of this algorithm for learning motion behavior on a humanoid upper body robotic system. We will focus on learning from different types of demonstrations. The formulation of the reward function for reinforcement learning will be described and multiple test scenarios in 2D and 3D will be presented. Finally, the capability of this approach to learn and improve motion primitives is demonstrated on a real robotic system within an obstacle test scenario.

MPG.PuRe

Probabilistic Recurrent State-Space Models

Author: Daniel Christian
Doerr Andreas
Nguyen-Tuong Duy
Schaal Stefan
Schiegg Martin
Toussaint Marc
Trimpe Sebastian
Publication date: 01/01/2018

State-space models (SSMs) are a highly expressive model class for learning patterns in time series data and for system identification. Deterministic versions of SSMs (e.g. LSTMs) proved extremely successful in modeling complex time series data. Fully probabilistic SSMs, however, are often found hard to train, even for smaller problems. To overcome this limitation, we propose a novel model formulation and a scalable training algorithm based on doubly stochastic variational inference and Gaussian processes. In contrast to existing work, the proposed variational approximation allows one to fully capture the latent state temporal correlations. These correlations are the key to robust training. The effectiveness of the proposed PR-SSM is evaluated on a set of real-world benchmark datasets in comparison to state-of-the-art probabilistic model learning methods. Scalability and robustness are demonstrated on a high-dimensional problem.

arXiv.org e-Print Archive

MPG.PuRe

Development of an information data model for an incentive system for employees and students

Author: Aicher Alexandra
Brümmendorf Tim Henrik
Dimmeler Stefanie
Doerr Hans W.
Hoffmann Jedrzej
Spyridopoulos Ioakim
Zeiher Andreas M.
Publication date: 01/01/2009

The article considers an incentive system for employees and students. An entity-relationship diagram is constructed for the process of recording all stages of the document workflow involved. An example form from the developed information system for recording and analyzing the distribution of incentives by employees to students is presented.

Electronic archive of Tomsk Polytechnic University

A Temporary Pause in the Replication Licensing Restriction Leads to Rereplication during Early Human Cell Differentiation

Author: Abu-Halima Masood
Doerr Julia
Du Yiqing
Fischer Ulrike
Isted Christina
Keller Andreas
Ludwig Nicole
Meese Eckart
Minet Marie
Publication venue: Saarländische Universitäts- und Landesbibliothek
Publication date: 01/03/2022

Gene amplifications in amphibians and flies are known to occur during development and have been well characterized, unlike in mammalian cells, where they are predominantly investigated as an attribute of tumors. Recently, we first described gene amplifications in human and mouse neural stem cells, myoblasts, and mesenchymal stem cells during differentiation. The mechanism leading to gene amplifications in amphibians and flies depends on endocycles and multiple origin-firings. So far, there is no knowledge about a comparable mechanism in normal human cells. Here, we describe rereplication during the early myotube differentiation of human skeletal myoblast cells, using fiber combing and pulse-treatment with EdU (5-Ethynyl-2′-deoxyuridine)/CldU (5-Chloro-2′-deoxyuridine) and IdU (5-Iodo-2′-deoxyuridine)/CldU. We found rereplication during a restricted time window between 2 h and 8 h after differentiation induction. Rereplication was detected in cells simultaneously with the amplification of the MDM2 gene. Our findings support rereplication as a mechanism enabling gene amplification in normal human cells.

Directory of Open Access Journals

PubMed Central

Universaar

Design and implementation of a platform for hyperconnected cyber physical systems

Author: Baccelli Emmanuel
Doerr Joerg
Kikuchi Shinji
Morgenstern Andreas
Schleiser Kaspar
Thomas Ian
Publication venue: Elsevier BV
Publication date: 01/10/2018

The Internet of Things (IoT) is an area of growing importance as more and more computing capability becomes embedded into real-world objects and environments. But at the same time IoT is just one component of a widespread shift towards a new age of federation, combining with other trends such as cloud computing, blockchain and automation to create a new hyperconnected infrastructure. This infrastructure will emerge from the convergence of traditional, cloud and IoT-based models of computing, creating a more decentralised, secure and democratic computing platform for the future. But while bringing significant benefits, federation also brings significant problems, in particular the complexity of building, integrating and managing systems built using highly distributed and heterogeneous platforms. In this paper we discuss our work on modelling, deployment and management for this new converged computing environment, leveraging previous work on domain languages, cloud computing and the Web of Things to accelerate and democratize the development of real-world hyperconnected systems.

INRIA a CCSD electronic archive server

Persisting right-sided chylothorax in a patient with chronic lymphocytic leukemia: a case report

Author: Andreas Mackensen
BA Staats
BC Marts
Bernd M Spriewald
CH Doerr
DC Mares
DM Bethencourt
DW Johnson
EA Aranda
EE McGrath
F Maldonado
FL Ampil
Godehard A Scholz
Horia Sirbu
J Gerstein
Katharina Anders
MG Alexandrakis
MK Ferguson
O Zimhony
RS Lampson
Sabine Semrau
SK Nair
TW Rice
VG Valentine
Publication venue: BioMed Central
Publication date: 01/01/2011

Introduction: Chylothorax caused by chronic lymphocytic leukemia is very rare and the best therapeutic approach, especially the role of modern immunochemotherapy, is not yet defined. Case presentation: We present the case of a 65-year-old male Caucasian patient with right-sided chylothorax caused by a concomitantly diagnosed chronic lymphocytic leukemia. As first-line treatment, four cycles of an immunochemotherapy consisting of fludarabine, cyclophosphamide and rituximab were administered. In addition, our patient received total parenteral nutrition for the first two weeks of treatment. Despite the very good clinical response of the lymphoma to treatment, the chylothorax persisted and percutaneous radiotherapy of the thoracic duct was applied. However, eight weeks after the radiotherapy the chylothorax still persisted and our patient agreed to a surgical intervention. A ligation of the thoracic duct via a muscle-sparing thoracotomy was performed, resulting in a complete cessation of the pleural effusion. Apart from the first two weeks, our patient was treated on an out-patient basis for nearly six months. Conclusion: In this case of chylothorax caused by chronic lymphocytic leukemia, immunochemotherapy in combination with conservative treatment, and even consecutive radiotherapy, were not able to stop pleural effusion, despite the very good clinical response of the chronic lymphocytic leukemia to treatment. Out-patient management using repetitive thoracocenteses can be safe as bridging until definitive surgical ligation of the thoracic duct.

Crossref

Springer - Publisher Connector

PubMed Central

Reprogramming Low-end IoT Devices from the Cloud

Author: Baccelli Emmanuel
Doerr Joerg
Jallouli Ons
Kikuchi Shinji
Morgenstern Andreas
Padilla Francisco
Schleiser Kaspar
Thomas Ian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 02/07/2018

The Internet of Things (IoT) consists of a variety of smart connected objects, among them a category of low-end devices based on microcontrollers. The orchestration of low-end IoT devices is not straightforward because of the lack of generic and holistic solutions articulating cloud-based tools on one hand, and low-end IoT device software on the other hand. In this paper, we describe such a solution, combining a cloud-based IDE, graphical programming, and automatic JavaScript generation. Scripts are pushed over the Internet and over-the-air for the last hop, updating runtime containers hosted on heterogeneous low-end IoT devices running RIOT. We demonstrate a prototype working on common off-the-shelf low-end IoT hardware with as little as 32 kB of memory.

INRIA a CCSD electronic archive server

Fraunhofer-ePrints

Rich Magnetic Phase Diagram of Putative Helimagnet Sr$_3$Fe$_2$O$_7$

Author: Andriushin Nikita D.
Doerr Mathias
Granovsky Sergey
Grumbach Justus
Hoser Andreas
Inosov Dmytro S.
Jain Anil
Keimer Bernhard
Kim Jung-Hwa
MacFarlane W. Andrew
Maljuk Andrey
Ollivier Jacques
Onykiienko Yevhen A.
Peets Darren C.
Pomjakushin Vladimir
Reehuis Manfred
Tymoshenko Yuliia V.
Publication date: 14/11/2023

The cubic perovskite SrFeO$_3$ was recently reported to host hedgehog- and skyrmion-lattice phases in a highly symmetric crystal structure which does not support the Dzyaloshinskii-Moriya interactions commonly invoked to explain such magnetic order. Hints of a complex magnetic phase diagram have also recently been found in powder samples of the single-layer Ruddlesden-Popper analog Sr$_2$FeO$_4$, so a reinvestigation of the bilayer material Sr$_3$Fe$_2$O$_7$, believed to be a simple helimagnet, is called for. Our magnetization and dilatometry studies reveal a rich magnetic phase diagram with at least 6 distinct magnetically ordered phases and strong similarities to that of SrFeO$_3$. In particular, at least one phase is apparently multiple-$\mathbf{q}$, and the $\mathbf{q}$s are not observed to vary among the phases. Since Sr$_3$Fe$_2$O$_7$ has only two possible orientations for its propagation vector, some of the phases are likely exotic multiple-$\mathbf{q}$ order, and it is possible to fully detwin all phases and more readily access their exotic physics. Comment: 14 pages, 13 figures

arXiv.org e-Print Archive